ABSTRACT 

Prior research on learning has been linked to 
instruction by the derivation of general principles of instructional 
design from learning theories. However, such design principles are 
often difficult to apply to particular instructional issues. A new 
method for relating research on learning to instructional design is 
proposed: Different ways of teaching a particular topic can be 
evaluated by teaching that topic to a simulation model of learning 
and recording the complexity of the resulting learning processes. A 
study to compare two mathematically correct algorithms for computing 
the difference between two multi-digit numbers from a conceptual or 
mechanical perspective was designed for both methodological and 
substantive purposes. The algorithms chosen for modeling were 
"regrouping" and "augmenting". The production system architecture of 
the simulation is explained. Learning difficulty is 
measured by the number of states and cycles that the simulation 
system needs to learn the method, taken over all the 
training problems. Results of the learning runs imply that regrouping 
is more difficult than augmenting, and that learning subtraction 
conceptually is more difficult than learning it mechanically, a 
conclusion that would seem to contradict widely held beliefs in the 
mathematics education community. The presupposition that accurate 
simulation models can be developed is discussed, and the advantages 
and disadvantages of the general method of simulation use are 
evaluated. (MDH) 

Knowledge and Understanding in Human Learning 



Knowledge and Understanding in Human Learning (KUL) is an umbrella 
term for a loosely connected set of activities led by Stellan Ohlsson 
at the Learning Research and Development Center, University of 
Pittsburgh. The aim of KUL is to clarify the role of world knowledge 
in human thinking, reasoning, and problem solving. World 
knowledge consists of concepts and principles, and contrasts with 
facts (episodic knowledge) and with cognitive skills (procedural 
knowledge). The long term goal is to answer six questions: How can 
the concepts and principles of particular domains be identified? 
How are concepts and principles acquired? How can the acquisition 
of concepts and principles be assessed? How are concepts and 
principles encoded in the mind? How are concepts and principles 
utilized in performance and learning? How can instruction facilitate 
the acquisition and utilization of concepts and principles (as opposed 
to episodic or procedural knowledge)? Different methodologies are 
used to investigate these questions: Psychological experiments, 
protocol studies, computer simulations, historical studies, semantic, 
logical, and mathematical analyses, instructional intervention 
studies, and so on. A list of KUL reports appears at the back of this 
report. 

Abstract 



In the past, research on learning has been linked to instruction by the 
derivation of general principles of instructional design from learning 
theories. But such design principles are often difficult to apply to 
particular instructional issues. A new method for relating research on 
learning to instructional design is proposed: Different ways of teaching 
a particular topic can be evaluated by teaching that topic to a 
simulation model of learning and recording the complexity of the 
resulting learning processes. An application of this method to a 
traditional problem in mathematics education suggests that conceptual 
instruction in arithmetic causes more cognitive strain than mechanical 
instruction, contrary to a widely held belief in the mathematics 
education community. The advantages and disadvantages of the general 
method are discussed. 



Keywords: Arithmetic, augmenting, computer simulation, instructional 
design, learning theory, regrouping, subtraction, understanding 

On the Relation Between Learning Theory and Instruction 

Instruction is an artefact, a social practice deliberately designed to 
achieve a particular purpose. A theory of instruction is therefore a 
prescriptive theory. The task of such a theory is to state principles that 
constrain search through the space of instructional designs [30]. A 
theory of learning, on the other hand, is a descriptive theory. The task of 
a learning theory is to state principles that accurately describe the 
mechanisms of cognitive change. Instructional theory and learning 
theory are distinct intellectual enterprises, just as agriculture and 
botany, medicine and physiology, engineering and physics are distinct 
enterprises [10, 12]. 

As these analogies suggest, the enterprises of instruction and 
learning, although distinct, are closely related. Physical therapies that 
ignore the chemistry and physiology of the human body are likely to do 
the patient more damage than good; machines that violate the laws of 
physics cannot work. Similarly, instructional designs that are not in 
accord with the mechanisms of cognitive change are unlikely to 
facilitate learning. 

The notion that a theory of instruction should be informed by a 
theory of learning is hardly controversial when stated abstractly. Glaser 
traces this idea back to both John Dewey and Edward L. Thorndike [10], 
but there are many recent advocates [13, 32, 34, 35]. But how, 
specifically, are the two enterprises supposed to interact? How can 
instructional designs be informed by principles of learning? The 
traditional method for applying learning theory to instructional 
questions is to derive general principles of instruction from general 
principles of learning; the application of the derived principles to the 
design of instruction in a particular topic is left to the designer. The 
first systematic application of this method was launched by the 
behaviorists. Principles of stimulus-response relations and 
reinforcement gave rise to instructional principles that emphasized 
behavioral objectives and maximally efficient reinforcement schedules 
[11]. The application of Piagetian research to instructional questions 
has taken a similar form: The principle that equilibrium requires a 
balance between assimilation and accommodation has given rise to 
training programs that deliberately induce disequilibrium in order to 
accelerate cognitive change [22]. David Ausubel's theory of learning as 
successive elaboration gave rise to Reigeluth's theory of instructional 
design [32]. In each approach, general principles of instruction are 
derived from general principles of learning, but the application of those 
design principles to particular instructional topics is based on 
intuition, common sense, and seat-of-the-pants judgments. 

Modern cognitive psychology, based on information processing 
concepts, has surpassed past approaches with respect to the power of 
its theories, and with respect to the depth and the detail of its 
descriptions of cognitive processes. But its application to instructional 
questions has so far taken the same old form: General principles of 
instructional design are derived from general principles of learning; the 
application of those principles to particular instructional designs is 
left to the designer. For example, the principles of the ACT* theory [1] 
have given rise to several instructional principles, including that one 
should teach the goal tree for cognitive skills [2]. This principle is 
surely correct, but its application to a particular instructional topic is 
nevertheless problematic. How is this principle to be applied, for 
example, in the teaching of arithmetic? Should one teach the entire goal 
tree for subtraction with regrouping to all students, even to very young 
students? Are there no situations in which the complexity of the goal 
tree might be an obstacle to learning? Should the entire goal tree be 
taught at once, or should one introduce it component by component? If 
so, how should the components be sequenced? The general principle does 
not, by itself, answer instructional questions of this detailed sort. 

This chapter explores a different approach to the interaction 
between the theory of learning and the theory of instruction. Instead of 
deriving general principles of instruction from a learning theory, this 
approach exploits the fact that information processing theories of 
learning can be embodied in runnable simulation models to answer 
particular instructional questions. A common and important type of 
instructional problem, perhaps the only type, is to decide between 
alternative ways of teaching a particular topic. Problems of this type 
can be solved, I suggest, by teaching the relevant topic to a simulation 
model of learning. To compare two ways of teaching a particular topic, 
we teach that topic to the learning model in both ways, and we measure 
the computational complexity of the learning processes induced in the 
two cases. If the simulation model expends less computational work to 
learn under one form of instruction than under another, then it predicts 
that the former is preferable to the latter. The main purpose of this 
chapter is to present an application of this method to a traditional issue 
in arithmetic instruction. 

The method of teachable simulation models has three prerequisites. 
First, it requires a runnable model. So-called information processing 
models that consist of labelled boxes with arrows of varying thickness 
going in and out of them are of no help; neither are computer models 
with such shaky implementation that they can barely produce a single 
demonstration run without breaking; neither are programs that only 
embody some of the assumptions of the underlying theory (while the 
other assumptions are embodied in some other program). The method of 
teachable simulation models requires a robust, integrated computer 
model that can be run on a variety of inputs. Second, the method requires 
that the simulation model is capable of learning. A performance model 
is not enough. Third, the learning mechanisms of the model must be such 
that their inputs can be interpreted as instruction. A model of learning 
by doing is not enough; the method requires a model of learning from 
declarative messages that originate in an outside source. The HS model 
described below satisfies these three prerequisites. 

The particular instructional question investigated in this chapter 
concerns the teaching of arithmetic. The question of how to teach an 
arithmetic skill like subtraction has been approached in different ways 
by different generations of researchers. An earlier generation focussed 
on the question of which subtraction algorithm is easier for children to 
learn. Large scale empirical research programs were launched to answer 
this question [5, 6]. The answer was, briefly summarized, that the 
method of regrouping (or "decomposition") is easier to learn than the 
method of augmentation (or "equal addition"), at least when subtraction 
is taught conceptually (as opposed to mechanically). I show in this 
chapter that the method of teachable simulation models implies a 
different answer to this question. 

The current generation of researchers in mathematics education 
focusses on the contrast between rote and insightful learning of 
arithmetic algorithms. They strive to find methods that facilitate 
school children's acquisition of the conceptual rationale for arithmetic 
algorithms, in the hope that conceptual understanding will eliminate 
errors, improve retention, and facilitate transfer to unfamiliar problems 
[15]. The method of teachable simulation models leads me to a rather 
contrary answer to this question. 

In summary, the present chapter has both a methodological and a 
substantive purpose. I propose a general method that exploits the fact 
that information processing theories of learning can be embodied in 
runnable simulation models to answer particular instructional 
questions. The method is introduced in the context of a particular 
application. The application is not merely a demonstration of the 
method. The specific conclusions reached have important implications 
for instruction in arithmetic. 

Regrouping versus Augmenting 

There are several mathematically correct algorithms for computing 
the difference between two multi-digit integers. Educational 
researchers at the beginning of this century asked whether one of these 
algorithms is easier to learn than the others, a very reasonable 
question. In the regrouping algorithm non-canonical columns, i. e., 
columns in which the minuend digit is smaller than the subtrahend digit, 
are dealt with by incrementing the relevant minuend digit with one 
place-value unit. To keep the value of the minuend constant, this change 
in the minuend is compensated by decrementing the first non-zero 
minuend digit with a higher place value than the incremented digit. In 
the augmenting algorithm non-canonical columns are also dealt with by 
incrementing the minuend digit, but in this case the change in the 
minuend is compensated by incrementing the subtrahend digit with the 
next higher place value. (Strictly speaking, the entities which are 
incremented and decremented are the numbers which the digits refer to. 
Since no ambiguity results, I use the somewhat inaccurate locution 
"decrementing a digit" instead of the accurate but tedious "decrementing 
the number a particular digit refers to".) 
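
The two procedures can be sketched in code. The following is a minimal illustration, not the simulation model discussed in this chapter: the function names `regroup_subtract` and `augment_subtract` and the two helper functions are hypothetical, digits are stored with the units column first, and the minuend is assumed to be at least as large as the subtrahend.

```python
def to_digits(n, width):
    """Return the decimal digits of n, units digit first, padded to width."""
    return [(n // 10 ** i) % 10 for i in range(width)]


def from_digits(digits):
    """Inverse of to_digits: rebuild the number from its digits."""
    return sum(d * 10 ** i for i, d in enumerate(digits))


def regroup_subtract(minuend, subtrahend):
    """Subtract by regrouping; assumes minuend >= subtrahend >= 0."""
    width = len(str(minuend))
    top = to_digits(minuend, width)
    bottom = to_digits(subtrahend, width)
    answer = []
    for col in range(width):
        if top[col] < bottom[col]:            # non-canonical column
            # Find the first non-zero minuend digit with a higher place value.
            donor = col + 1
            while top[donor] == 0:
                donor += 1
            top[donor] -= 1                   # compensate: decrement that digit
            for zero in range(donor - 1, col, -1):
                top[zero] = 9                 # zeroes passed over become nines
            top[col] += 10                    # change: increment the minuend digit
        answer.append(top[col] - bottom[col])
    return from_digits(answer)


def augment_subtract(minuend, subtrahend):
    """Subtract by augmenting; assumes minuend >= subtrahend >= 0."""
    width = len(str(minuend))
    top = to_digits(minuend, width)
    bottom = to_digits(subtrahend, width)
    answer = []
    for col in range(width):
        if top[col] < bottom[col]:            # non-canonical column
            top[col] += 10                    # change: increment the minuend digit
            bottom[col + 1] += 1              # compensate: increment the next
                                              # subtrahend digit
        answer.append(top[col] - bottom[col])
    return from_digits(answer)


print(regroup_subtract(503, 78), augment_subtract(503, 78))  # 425 425
```

Both functions leave the overall difference unchanged at every step; they differ only in where the compensating change is made.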

Which algorithm is easier? 

The regrouping and augmenting algorithms build on different 
mathematical ideas. The regrouping algorithm is based on the 
associative law 

(a + b) + c = a + (b + c). 

The associative law implies that the value of the minuend remains 
constant through the regrouping operation. (A complete derivation of the 
regrouping algorithm from first principles is available in [25].) The 
augmenting algorithm, on the other hand, is based on the constant 
difference law 

a - b = (a + k) - (b + k). 

This law implies that the difference between the minuend and the 
subtrahend remains constant through the augmenting operation. (A more 
detailed discussion of the rationale for the augmenting algorithm is 
available in [8].) Since the two algorithms build on different 
mathematical ideas, it is entirely plausible that one of them is easier 
to learn and/or to execute than the other. 
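
A worked example may make the two laws concrete. The problem 52 - 17 is chosen here for illustration only and does not appear in the original discussion. Under regrouping, the associative law moves one ten from the tens column to the units column of the minuend; under augmenting, the constant difference law adds ten to both numbers (k = 10), as ten units in the minuend and as one ten in the subtrahend.

```latex
\begin{align*}
\text{Regrouping:}\quad 52 &= (40 + 10) + 2 = 40 + (10 + 2) = 40 + 12, \\
52 - 17 &= (40 - 10) + (12 - 7) = 30 + 5 = 35. \\[4pt]
\text{Augmenting:}\quad 52 - 17 &= (52 + 10) - (17 + 10) = (50 + 12) - (20 + 7), \\
52 - 17 &= (50 - 20) + (12 - 7) = 30 + 5 = 35.
\end{align*}
```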

Large-scale classroom studies were performed in the early decades 
of this century in an effort to settle this issue empirically. William 
Brownell concluded: "Even a cursory survey of the ... experimental 
results ... reveals the impossibility of deciding simply and finally 
between D [the regrouping method] and EA [the equal addition method] as 
the better procedure for teaching 'borrowing'" [5, p. 169]. Augmenting 
was found to be easier than regrouping more often than the other way 
around, but the observed difference was small in magnitude. Brownell 
argued that the results were only in favor of augmenting when 
subtraction was taught as a mechanical performance. If subtraction was 
taught conceptually, he claimed, the results favored regrouping [5, 6]. 
Brownell's argument was widely accepted and politically instrumental 
in settling the issue in favor of teaching the regrouping method in 
American schools. Educators in other nations were not equally 
convinced, and the augmenting method is still taught in some European 
schools. 

The empirical studies did not clearly distinguish between 
performance and learning. They confused the question "Which algorithm 
is easier to use?" with the question "Which algorithm is easier to learn?" 
One reason for the lack of separation of these two questions is that pure 
measures of learning are hard to come by. We can only observe learning by 
recording performances, so most empirical measures will confound the 
two questions. In the context of a simulation model, the two questions 
can be cleanly separated. This section investigates which algorithm is 
easier to use, while the next section investigates which algorithm is 
easier to learn. 

In information processing terminology, the question of which 
algorithm is easier to use can be reformulated as follows: What is the 
relation between the cognitive complexity of the mental procedure 
corresponding to the regrouping algorithm and the cognitive complexity 
of the procedure corresponding to the augmenting algorithm? This 
question can be answered by implementing the two algorithms as 
psychologically plausible simulation models, running those models, and 
measuring their relative complexity. 

Simulating regrouping and augmenting 

The hypothesis that cognitive skills (mental procedures) are 
encoded as production systems was first proposed by Allen Newell 
and Herbert A. Simon [23], and has been adopted by a number of 
researchers [1, 18, 19]. According to the production system 
hypothesis, cognitive skills are encoded in sets of production rules, 
where each production rule has the general form 

Goal + Situation --> Action. 

The symbol "Goal" stands for a specification of a desired situation, 
"Situation" stands for a description of the relevant features of the 
current situation, and "Action" refers to something the person 
knows how to do. The intended interpretation of such a rule is that 
when the person has the specified goal, and he or she is in a 
situation that fits the situation description, then he or she will 
consider the specified action. A collection of interrelated 
production rules is a production system. Each cognitive skill is 
hypothesized to correspond to a production system. 

A production system architecture is a program that can 
interpret a production system. In this context, to interpret means to 
(a) decide which production rules (in a particular production 
system) are satisfied in the current situation, (b) select one or 
more rules to be evoked, and (c) execute the actions of the evoked 
rules. Each pass through the three steps (a)-(c) is one production 
system cycle, or operating cycle. The number of cycles required to 
execute a production system is one of the measures of cognitive 
complexity used in this chapter. 

The satisfied rules are identified by matching the Situation 
against the so-called working memory, a database which contains 
the system's information about the current state of affairs, and by 
matching the Goal against the system's current goal. If both 
components match, the rule is satisfied and is therefore a candidate 
for being evoked. Selecting which rules to evoke is sometimes 
called conflict resolution [21]. A typical conflict resolution scheme 
is to select those rules that match against the most recent 
information in working memory. Execution of the primitive actions 
must involve calls on motor programs that control the muscles of 
the relevant limbs, e. g., the finger muscles for the action of 
writing a digit, but production system theories do not have much to 
say about this aspect of human cognition. 

The HS architecture is a relatively standard production system 
architecture. It has a single working memory which contains 
information about both the current state of affairs, and the 
system's current goal(s). All available rules are matched against 
working memory in each operating cycle. There is no conflict 
resolution. Every satisfied rule is evoked. There is no complexity 
limitation on the left-hand side of the rules, but the right-hand side 
(the action part) is limited to a single action. The system continues 
to match and evoke rules until either there are no satisfied rules, or 
the current problem is solved. Detailed descriptions of the HS 
architecture are available in [28, 29]. 
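
The operating cycle described above can be sketched as a short interpreter. This is a simplified illustration, not the HS implementation: the names `Rule` and `run`, the dictionary representation of working memory, and the `max_cycles` guard are assumptions made for the example, and the satisfied rules are simply applied one after another to a single working memory.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Memory = Dict[str, Any]  # hypothetical working memory: goals and state together


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[Memory], bool]   # left-hand side: goal + situation
    action: Callable[[Memory], Memory]    # right-hand side: a single action


def run(rules: List[Rule], memory: Memory,
        solved: Callable[[Memory], bool],
        max_cycles: int = 1000) -> Tuple[Memory, int]:
    """Match and evoke rules until no rule is satisfied or the problem is solved.

    Returns the final working memory and the number of operating cycles.
    """
    cycles = 0
    while not solved(memory) and cycles < max_cycles:
        # (a) Decide which rules are satisfied in the current situation.
        satisfied = [rule for rule in rules if rule.condition(memory)]
        if not satisfied:
            break
        # (b) No conflict resolution: every satisfied rule is evoked.
        # (c) Execute the action of each evoked rule.
        for rule in satisfied:
            memory = rule.action(memory)
        cycles += 1
    return memory, cycles


# A toy task (not from the chapter): count a number down to zero.
count_down = Rule(
    name="count-down",
    condition=lambda wm: wm["goal"] == "reach-zero" and wm["counter"] > 0,
    action=lambda wm: {**wm, "counter": wm["counter"] - 1},
)

final_memory, cycles = run(
    [count_down],
    {"goal": "reach-zero", "counter": 3},
    solved=lambda wm: wm["counter"] == 0,
)
print(cycles)  # 3
```

The cycle count returned by `run` plays the role of the complexity measure: a procedure that needs more passes through steps (a)-(c) counts as more complex.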

Table 1. The distribution of production rules in two 
canonicalization algorithms. 



Rule type                 Regrouping   Augmenting 

Visual                         4            3 
Motor                         11           11 
   Write & cross out           6            6 
   Say answer                  5            5 
Cognitive                     20           17 
   Create expressions         11           12 
   Revise expressions          9            5 
Memory                         3            4 

All rules                     38           35 



In order to simulate subtraction with regrouping, the HS system 
was extended with a (simulated) task display and a (simulated) 
visual-motor interface consisting of an eye and a hand. The task 
display is a data structure in the computer which contains the same 
information as a piece of paper with a subtraction problem written 
on it. Technically speaking, the task display is a two-dimensional 
array of digits. (I am assuming that the subtraction problem is 
written in vertical format.) Information about the task display 
enters into the working memory of the HS system through a 
simulated eye, a program module which can only access one digit at 
a time. When the simulated eye 'looks' at a digit, information about 
that digit is entered into working memory. In order to gather 
information about some other digit, the eye has to be moved. The 
eye can move left, right, up, and down. Eye movements are distinct 
computational steps, so control of visual attention is encoded in 
production rules. The model can alter the external task display only 
through the use of a simulated hand. The hand can cross out an 
existing digit and write a digit in a blank space. These two 
primitive actions count as distinct computational steps, so the hand 
is also controlled by production rules. In short, the model simulates 
subtraction at the level of individual eye movements and individual 
writing actions, a very fine-grained level of analysis compared to 
most simulation models. 
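
The task display, eye, and hand can be pictured with a small data structure. The sketch below is an assumption-laden illustration rather than the original module: the class name `TaskDisplay`, its method names, the extra scratch row above the minuend, and the step counter are all hypothetical.

```python
class TaskDisplay:
    """A subtraction problem in vertical format, seen through a one-digit eye."""

    MOVES = {"left": (0, -1), "right": (0, 1), "up": (-1, 0), "down": (1, 0)}

    def __init__(self, minuend, subtrahend):
        top = [int(c) for c in str(minuend)]
        width = len(top)
        padded = str(subtrahend).rjust(width)
        bottom = [int(c) if c != " " else None for c in padded]
        # Rows: 0 = scratch space, 1 = minuend, 2 = subtrahend, 3 = answer.
        self.grid = [[None] * width, top, bottom, [None] * width]
        self.crossed_out = set()       # cells whose digit has been crossed out
        self.eye = (1, width - 1)      # start on the units digit of the minuend
        self.steps = 0                 # eye movements and hand actions so far

    def look(self):
        """Return the digit under the eye (None if blank or crossed out)."""
        row, col = self.eye
        return None if self.eye in self.crossed_out else self.grid[row][col]

    def move_eye(self, direction):
        """Move the eye one cell left, right, up, or down (one step)."""
        d_row, d_col = self.MOVES[direction]
        row, col = self.eye[0] + d_row, self.eye[1] + d_col
        if not (0 <= row < len(self.grid) and 0 <= col < len(self.grid[0])):
            raise ValueError(f"cannot move {direction} from {self.eye}")
        self.eye = (row, col)
        self.steps += 1

    def cross_out(self, row, col):
        """Cross out an existing digit (one hand action)."""
        if self.grid[row][col] is None or (row, col) in self.crossed_out:
            raise ValueError("no digit to cross out")
        self.crossed_out.add((row, col))
        self.steps += 1

    def write(self, row, col, digit):
        """Write a digit in a blank cell (one hand action)."""
        if self.grid[row][col] is not None:
            raise ValueError("cell is not blank")
        self.grid[row][col] = digit
        self.steps += 1


display = TaskDisplay(52, 17)
units_digit = display.look()   # the eye sees only the digit it rests on
display.move_eye("down")       # a further digit costs a separate step
display.cross_out(1, 0)        # cross out the tens digit of the minuend
display.write(0, 0, 4)         # write its replacement in the scratch row
print(units_digit, display.look(), display.steps)  # 2 7 3
```

Because every eye movement and hand action increments the counter, the cost of attention allocation becomes visible in the step count, which is the property the fine-grained analysis relies on.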

HS was also equipped with a long-term memory for number 
facts, e. g., 8 - 7 = 1. Retrieval of number facts was simulated with 
a function which returns the (correct) answer to any query about 
relations between two numbers. HS does not simulate the 
probabilistic nature of memory retrieval, nor the existence of 
incorrect number facts. Like attention allocation and writing, 
memory retrieval is a distinct computational step which is 
controlled by production rules. 

The HS models of regrouping and augmenting consist of 38 and 
35 production rules, respectively. The number of rules in each 
category is shown in Table 1. The distribution of rules 
over visual steps (i. e., move the eye), motor steps (i. e., write, 
cross out, and say the answer), cognitive steps (i. e., the creation 
and revision of working memory expressions), and memory steps (i. 
e., retrievals from long-term memory) is approximately the same 
for both models. The details of the rules themselves are not 
important for present purposes. Examples of complete production 
rules are available in [8]. 

In order to estimate the cognitive complexity of the two 
subtraction algorithms, the two simulation models were run on a 
subtraction test consisting of 66 subtraction problems which 
varied with respect to number of columns, number of non-canonical 
columns, and number of blocking zeroes, i. e., zeroes immediately to 
the left of a non-canonical column (or another blocking zero). The 
number of production system cycles required by each model to 
complete each problem was recorded. In addition, each cycle was 
classified with respect to the type of rule that was evoked in that 
cycle. 
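
The problem features that define the test can be computed mechanically. The sketch below applies the definitions in the text literally; the function names are hypothetical, blank cells in the subtrahend are treated as zeroes, and a column whose minuend digit is zero may be counted both as non-canonical and as a blocking zero. Whether the original test classified such columns in the same way is not stated.

```python
def column_pairs(minuend, subtrahend):
    """Return (minuend digit, subtrahend digit) pairs, units column first."""
    top = [int(c) for c in reversed(str(minuend))]
    bottom = [int(c) for c in reversed(str(subtrahend))]
    bottom += [0] * (len(top) - len(bottom))   # treat blank cells as zero
    return list(zip(top, bottom))


def describe_problem(minuend, subtrahend):
    """Count columns, non-canonical columns, and blocking zeroes.

    Assumes minuend >= subtrahend >= 0.
    """
    pairs = column_pairs(minuend, subtrahend)
    # Non-canonical: the minuend digit is smaller than the subtrahend digit.
    non_canonical = [m < s for m, s in pairs]
    # Blocking zero: a zero in the minuend immediately to the left of a
    # non-canonical column or of another blocking zero.
    blocking = [False] * len(pairs)
    for col in range(1, len(pairs)):
        right_blocks = non_canonical[col - 1] or blocking[col - 1]
        blocking[col] = pairs[col][0] == 0 and right_blocks
    return {
        "columns": len(pairs),
        "non_canonical": sum(non_canonical),
        "blocking_zeroes": sum(blocking),
    }


print(describe_problem(58, 23))
# {'columns': 2, 'non_canonical': 0, 'blocking_zeroes': 0}
print(describe_problem(1005, 7))
# {'columns': 4, 'non_canonical': 1, 'blocking_zeroes': 2}
```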

The results are shown in Figure 1. The figure shows the 
cognitive complexity of the regrouping and augmenting algorithms 
on eleven different problem types. Problem types 1-4 have two, 
three, four, or five canonical columns, respectively, but no non- 
canonical columns. The number of cycles required to complete such 
problems is the same for both models. Problem types 5-8 have one, 
two, three, or five non-canonical columns, respectively. The 
regrouping model requires more steps to handle each such column 
than the augmenting model. The difference is small in magnitude. 
The difference is located entirely in the visual-motor interface, i. 
e., the regrouping algorithm requires more cycles because it 
involves more complicated attention allocation. 

Problem types 9, 10, and 11 have one, two, or three blocking 
zeroes, respectively. (A blocking zero is immediately to the left of 
a non-canonical column or another blocking zero.) The regrouping 
model has a slight advantage on these problem types. The reason is 
that once a set of columns has been traversed by the regrouping 
procedure, no further regrouping of those columns is needed. The 
augmenting algorithm, on the other hand, has to augment every 
column with zero as the subtrahend digit and a non-zero minuend 
digit. Consequently, if there are several blocking zeroes in a 
problem, the regrouping algorithm completes that problem in 
slightly fewer operating cycles than the augmenting algorithm. Once 
again, the difference is small in magnitude. A more extensive 
discussion of these results is available in [8]. 

Figure 1. The number of production system cycles required to 
execute the regrouping and augmenting algorithms in eleven 
different problem types. The regrouping bar is to the right and the 
augmenting bar to the left for each problem type. Each bar is 
segmented to show the number of cognitive steps for canonical 
columns (bottom segment), cognitive steps for non-canonical 
columns (second segment from bottom), memory steps (third 
segment from bottom), and the number of eye and hand movements 
(top segment). 

Discussion 

The simulations of the regrouping and augmenting algorithms 
teach us several lessons. First, the difference between the two 
algorithms with respect to cognitive complexity is small in 
magnitude. Since the two algorithms are derived from different 
mathematical ideas, it is not obvious why this is so. Closer 
reflection reveals the reason. Both the law of associativity and the 
constant difference law are instances of a more general law which 
says that a quantity remains constant if every change in it is 
compensated by a corresponding counterchange. The structure of 
this law implies that the goal structure of the corresponding 
algorithm will contain two main subgoals: a change goal and a 
compensate goal. This is indeed the case for both algorithms. 
Furthermore, the internal structure of each change or compensation 
is always the same: Cross out a digit, compute the replacement 
digit, and write the replacement digit. Since the structure of the 
goal tree is similar in both algorithms, the number of cycles of 
operation is nearly equal. This equality is, in a sense, accidental. In 
general, there is no reason to expect different mathematical laws 
to generate algorithms with similar goal structures. 

Second, the simulations show that the differences between the 
two algorithms have different directions on different types of 
problems. There is no difference on canonical problems. The 
difference is in favor of augmenting on problems which have non- 
canonical columns but no blocking zeroes. The difference is in favor 
of regrouping on problems which have two or more blocking zeroes. 
The implication of this result is that empirical measures of the 
cognitive complexity of the two algorithms will depend on the 
composition of the test. A test without blocking zeroes will favor 
the augmenting algorithm, but a test with many blocking zeroes 
will favor regrouping. In a mixed test the differences will tend to 
cancel each other. Unfortunately, some of the pre-World War II 
studies did not specify which subtraction problems were used to 
measure the students' performance. 

The outcomes of the simulation runs are consistent with the 
pattern of empirical results in the literature. If there are only 
small differences, and if those differences go in different 
directions for different classes of problems, then we would expect 
empirical measurements to give inconsistent results. Sometimes 
one algorithm should appear to be easier, sometimes the other, due 
either to the composition of the test problems or to sampling error. 
This is exactly what the literature shows [5, 6]. 

These simulations imply that it does not matter which 
algorithm is taught. Regrouping and augmenting are equally 
complicated; the differences in cognitive complexity are too small 
to be of pedagogical significance. This conclusion is consistent 
with the fact that both algorithms are, in fact, taught in different 
school systems, without a noticeably higher degree of success in one 
system than in the other. However, the study summarized in this 
section (and reported in more detail in [8]) only concerned the 
execution of the two algorithms. The two algorithms are equally 
complex to use, once learned. But Brownell's argument was that 
regrouping is easier to learn than augmenting, at least if 
subtraction is taught conceptually. We therefore need to investigate 
the cognitive complexity of the construction (as opposed to 
execution) of the two algorithms. In addition, we need to compare 
the cognitive complexity of the construction under both conceptual 
and mechanical instruction. 

Conceptual versus Mechanical Instruction 

As mathematics educators deepen their analysis of 
mathematical cognition, they become more and more concerned with 
the question of conceptual understanding [15]. This concern is partly 
fuelled by research into children's mathematical errors. Catalogues 
of error patterns have been compiled for a number of mathematical 
tasks, including subtraction [4, 38, 39, 41] and fractions [9, 14, 17, 
27, 31, 36, 37]. Most of the error patterns described in these 
catalogues are senseless; they have no discernible relation to the 
correct mathematical operations. To observe children making 
senseless mistakes is a frustrating experience, and it is impossible 
not to believe that if children only understood what they are doing, 
they would not make those mistakes. Following this line of 
reasoning, mathematics educators have tried to design conceptually 
based instruction in arithmetic. 

Does conceptual understanding help? 

The purpose of many instructional interventions in arithmetic 
is to show that if children are taught the conceptual rationale for 
the arithmetic algorithms, they will have less difficulty in learning 
those algorithms, and their performance will be less error prone 
and more flexible in response to changing task demands [15]. 
Unfortunately, this enterprise has not been spectacularly 
successful. 

A training study by Resnick and Omanson can serve as an 
example [33]. Children with faulty subtraction performance were 
taught the conceptual rationale of the regrouping algorithm with 
the help of Dienes blocks. The instruction was designed to force 
children to map back and forth between blocks and numbers. The 
children first performed a step with the blocks, and then performed 
the same step with the symbols. At the end of the instruction, 
several of the children could explain the correct subtraction 
procedure. When they were given subtraction problems to perform, 
they nevertheless made errors. As a second example, Ohlsson, Bee, 
and Zeller taught children how to add fractions with an interactive 
computer tool that enabled children to switch back and forth 
between graphical and numerical representations of fractional 
quantities [27]. A change in one representation was automatically 
mirrored by the corresponding change in the other representation. A 
detailed analysis of the children's performance on the pre- and 
posttests revealed that they could map back and forth between the 
fraction symbol x/y and concrete representations of fractional 
quantities. All of them nevertheless committed the standard error 
of adding fractions by adding both numerators and denominators on 
the posttest. In both of these studies, instruction that was 
carefully designed to make the meaning of the mathematical 
operations evident failed to prevent or cure senseless errors. 

These empirical failures focus attention on the lack of 
theoretical analysis of conceptual understanding in the context of 
arithmetic. What is meant by conceptual understanding, and what is 
its (supposed) function in procedural learning? How does conceptual 
instruction interact with the construction of a mental procedure? 
Why should we believe that knowledge of the rationale of an 
arithmetic procedure facilitates the learning of that procedure? In 
spite of the recent emphasis on conceptual understanding in 
arithmetic instruction, little effort has been spent in answering 
these questions. 

My approach to these questions is to extend the HS architecture 
with a learning mechanism that enables the model to learn 
procedures on the basis of instruction. The instruction is modeled 
as a set of declarative knowledge units that the user gives to the 
system. Such a learning mechanism enables us to teach the model 
how to do subtraction. We supply the system with a set of 
declarative knowledge units which correspond to the instructions a 
teacher would give a student, and the system learns by converting 
those knowledge units into a cognitive skill, i.e., into production 
rules. By giving the system different sets of declarative knowledge 
units, we can simulate the effects of different ways of teaching 
subtraction. In particular, we can compare conceptual instruction 
with mechanical instruction. 

Making HS teachable 

In a production system architecture, a learning mechanism is 
any process that can revise existing production rules or generate 
new ones. When a new rule is added to a production system, the 
behavior of the system changes. The new rule will control behavior 
in those situations in which it matches working memory. Since the 
new rule is different from previous rules, the system's behavior 
will be different. The fact that the behavior changes is the main 
reason to regard the generation of new rules as a simulation of 
(procedural) learning. 
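
The point that adding a rule changes behavior can be illustrated with a minimal sketch of a production system. This is not the HS architecture; the `Rule` class, the working memory labels, and the policy of preferring the most recently added matching rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional

WorkingMemory = FrozenSet[str]


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[WorkingMemory], bool]  # does the rule match working memory?
    action: str                                 # label of the action the rule proposes


def select_action(rules: List[Rule], wm: WorkingMemory) -> Optional[str]:
    """Return the action of the most recently added matching rule.

    Preferring newer rules is a toy conflict-resolution policy (an assumption).
    """
    for rule in reversed(rules):
        if rule.condition(wm):
            return rule.action
    return None


# Hypothetical initial rule: write the column difference whenever a column is in focus.
rules = [Rule("subtract-column", lambda wm: "column-in-focus" in wm, "write-difference")]
wm = frozenset({"column-in-focus", "top-digit-smaller"})
before = select_action(rules, wm)

# Adding a rule changes behavior only in situations that the new rule matches.
rules.append(Rule("borrow-first", lambda wm: "top-digit-smaller" in wm, "regroup"))
after = select_action(rules, wm)
unaffected = select_action(rules, frozenset({"column-in-focus"}))
```

In this sketch the system's response to the first situation changes from `write-difference` to `regroup` once the new rule is present, while its response to situations the new rule does not match stays the same.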



A number of simulation systems model procedural learning as 
the construction of new production rules (see, e.g., [1, 3, 16, 18, 
19, 24, 39]). These models simulate learning by doing, i.e., they 
model the effects of practice. In spite of their differences, they all 
instantiate the same abstract theory. The first principle of this 
abstract theory is that humans have access to one or more weak 
problem solving mechanisms (analogy, hill climbing, planning, 
search, etc.) which can generate task oriented behavior on 
unfamiliar problems. The second principle is that information about 
each problem solving step (the reasons for taking it, the 
desirability of the outcome, the temporal order of the steps, and so 
on) is stored in long-term memory. The third principle of the 
abstract theory is that the learning mechanisms construct new 
rules through some form of induction over the individual steps. For 
example, the SAGE system described by Langley carries out forward 
search and stores steps in which a particular action had good 
outcomes, as well as steps in which that action had bad outcomes 
[20]. The system learns by identifying one or more situation 
features that discriminate between the two classes of situations, 
and it constructs a new rule by incorporating those features into 
the rule that controls that action. Different models of learning by 
doing differ with respect to which weak methods they postulate, 
which information they assume is stored in memory, and which 
induction procedure they use, but they all instantiate the three 
abstract principles stated above. 

Simulation models that instantiate the abstract theory of 
learning by doing are quite successful in modeling the effects of 
practice. But models of practice are not sufficient for present 
purposes. There is nothing in such systems that corresponds to 
instruction, i.e., to a set of messages that originate outside the 
system and which are used to construct new procedural knowledge. 
A learning mechanism which is to simulate learning from 
instruction must take declarative knowledge units among its inputs. 

In the HS system, general world knowledge, including knowledge 
imparted by instruction, is assumed to consist of constraints on 
cognitive processes. For example, the laws of the number system 
impose constraints on arithmetic operations. Unless an addition 
procedure yields the same result for (a + b) + c as for a + (b + c), 
i.e., unless it satisfies the constraint imposed by the associative 
law, it is not a correct addition procedure. The notion of general 
knowledge as constraints is not limited to arithmetic, or, indeed, to 
mathematics. For example, the laws of conservation of energy, 
mass, and momentum are examples of natural science principles 
which are naturally cast as constraints. Traffic laws are good 
examples of constraints in everyday life. I do not claim that all 
general knowledge can be formulated as constraints, only that 
constraints are one important form of knowledge, a form, moreover, 
which is particularly relevant to arithmetic. In the HS system, 
constraints are encoded in knowledge elements which are distinct 
from both working memory elements and from production rules. 
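
As a hedged illustration of a law acting as a constraint, the sketch below tests candidate addition procedures against the associative law on a small sample of triples. The function names and the deliberately faulty procedure are hypothetical, and the encoding is not the one used in HS.

```python
from itertools import product
from typing import Callable, Iterable


def satisfies_associativity(add: Callable[[int, int], int],
                            samples: Iterable[int] = range(6)) -> bool:
    """Check (a + b) + c == a + (b + c) for every triple drawn from `samples`."""
    values = list(samples)
    return all(
        add(add(a, b), c) == add(a, add(b, c))
        for a, b, c in product(values, repeat=3)
    )


def correct_add(a: int, b: int) -> int:
    return a + b


def faulty_add(a: int, b: int) -> int:
    """Hypothetical buggy procedure: counts the first argument twice."""
    return 2 * a + b


ok = satisfies_associativity(correct_add)       # the constraint is satisfied
violated = not satisfies_associativity(faulty_add)  # the constraint is violated
```

The check is a necessary condition only, which matches the wording of the text: a procedure that violates the law cannot be a correct addition procedure, but passing a finite sample does not establish correctness.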

An incorrect or incomplete arithmetic procedure typically leads 
to results that violate one or more of the relevant constraints. For 
example, an incorrect or incomplete regrouping procedure might 
violate the constraint that the value of the subtrahend is to remain 
constant over regrouping. The basic idea behind the HS system is 
that a constraint violation contains information about how to revise 
the faulty procedure so that similar constraint violations are 
avoided in the future. In each operating cycle, the system matches 
all available constraints against the current state of affairs. If a 
constraint is satisfied, no action is taken. If one or more 
constraints are violated, the learning mechanism is triggered. This 
corresponds to having a tutor who watches a problem solution and 
provides instruction when needed. (The HS system is given all the 
constraints at the beginning of the simulation run, rather than 
single constraints, i.e., instructions, at select points during 
problem solving. Since the system effectively does not 'see' a 
constraint until it is violated, this difference from real tutoring is 
less significant than it first appears.) The learning mechanism analyzes 
the constraint violation, and revises the faulty rule accordingly. The 
technical details of the learning mechanism are not important for 
present purposes. A detailed description of the HS learning 
mechanism is available in [28, 29]. 

Since learning happens when the behavior of the system causes a 
constraint violation, there must be some initial rules which can 
generate behavior. HS must be supplied with at least one initial rule 
for each problem solving operator. In the simulation runs reported 
in this chapter, the initial rules are minimal, i.e., their condition 
sides contain only the applicability conditions for the relevant 
action. These incomplete rules generate almost random behavior. 
Each action is considered in every situation in which its 
applicability conditions are satisfied. The probability of causing a 
constraint violation is high. The system detects the violation, 
revises the faulty rule, and then starts over on the problem. The 
cycle of trying to solve the problem, detecting a violation, revising 
the faulty rule, and starting over continues until the problem can be 
solved without any constraint violations. This is a reasonable first 
approximation model of learning to solve problems under tutelage. 
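
The cycle of trying, detecting a violation, revising the faulty rule, and starting over can be sketched on a single subtraction column. This is a toy under strong assumptions, not the HS mechanism of [28, 29]: each hypothetical constraint simply names the rule to blame and the condition to add, whereas HS derives the revision by analyzing the violation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
Test = Callable[[State], bool]


@dataclass
class Rule:
    action: str
    conditions: List[Test] = field(default_factory=list)

    def matches(self, state: State) -> bool:
        return all(test(state) for test in self.conditions)


@dataclass
class Constraint:
    relevant: Test       # when does the constraint apply?
    satisfied: Test      # what must hold when it applies?
    blamed_action: str   # toy shortcut: which rule gets revised
    repair: Test         # toy shortcut: the condition added to that rule


def apply_action(action: str, state: State) -> State:
    new = dict(state)
    if action == "subtract":
        new["answer"] = new["top"] - new["bottom"]
        new["done"] = 1
    elif action == "borrow":
        new["top"] += 10
        new["borrowed"] = 1
    return new


def solve_with_learning(problem: State, rules: List[Rule],
                        constraints: List[Constraint],
                        max_restarts: int = 10) -> Tuple[int, int, int]:
    """Return (answer, learning events, cycles) for one training problem."""
    events = cycles = 0
    for _ in range(max_restarts + 1):
        state = dict(problem, done=0, borrowed=0)
        violated = None
        while not state["done"] and violated is None:
            rule = next((r for r in rules if r.matches(state)), None)
            if rule is None:
                raise RuntimeError("no rule matches the current state")
            state = apply_action(rule.action, state)
            cycles += 1
            violated = next((c for c in constraints
                             if c.relevant(state) and not c.satisfied(state)), None)
        if violated is None:
            return state["answer"], events, cycles
        # Learning event: specialize the blamed rule, then start over.
        for r in rules:
            if r.action == violated.blamed_action:
                r.conditions.append(violated.repair)
        events += 1
    raise RuntimeError("no violation-free solution found")


# Minimal initial rules: only applicability conditions (hypothetical).
rules = [Rule("subtract"), Rule("borrow", [lambda s: not s["borrowed"]])]
# Hypothetical constraint: a written column answer must not be negative.
constraints = [Constraint(
    relevant=lambda s: s["done"] == 1,
    satisfied=lambda s: s["answer"] >= 0,
    blamed_action="subtract",
    repair=lambda s: s["top"] >= s["bottom"],
)]

first = solve_with_learning({"top": 2, "bottom": 7}, rules, constraints)
second = solve_with_learning({"top": 7, "bottom": 2}, rules, constraints)
```

On the first problem the minimal `subtract` rule fires too early, the constraint is violated, the rule is specialized, and the restarted attempt borrows before subtracting. The second problem is then solved without any further learning event.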

In summary, the HS system encodes declarative knowledge, 
including instructions, as constraints on behavior. In arithmetic, 
the effect of faulty or incomplete procedural knowledge is typically 
to generate results that violate the constraints imposed by the 
laws of numbers. HS learns by analyzing a constraint violation and 
revising the rule that caused the violation in such a way that 
similar constraint violations are avoided in the future. This 
capability makes HS teachable: To teach HS a particular procedure, 
the user supplies the system with an initial set of (incomplete) 
rules and the constraints that define the correct procedure. Each 
constraint corresponds to an instruction. The system tries to solve 
problems, makes mistakes, and learns from the instructions it has 
been given. If the instructions are complete enough, the system will 
eventually arrive at the correct procedure. 

Teaching HS subtraction 

The HS system was taught both the regrouping and the 
augmenting algorithms for subtraction, and both algorithms were 
taught in two different ways, corresponding to conceptual and 
mechanical instruction. This subsection describes the inputs to the 
four simulation experiments, and the next subsection describes the 
results. 

What does it mean to do subtraction procedurally, as a 
mechanical skill? A person who does subtraction mechanically is 
not thinking about the mathematical objects (the numbers) 
symbolized by the digits in the problem display, nor about the 
mathematical relations between those numbers. For example, 
he/she does not think about the fact that the "3" in the numeral "32" 
denotes the number 30. Instead, he/she thinks about the digits 
themselves. He or she performs crossing out and writing actions on 
the physical display (i.e., the paper) without considering the 
mathematical meaning of those actions. 

Consistent with this interpretation of what it means to do 
subtraction mechanically, HS was supplied with a representation of 
a subtraction problem that was isomorphic to the information 
available in a standard problem display (vertical format). The 
representation contained information about which digits occurred in 
which spatial arrangement, but little else. In particular, there was 
no representation of the place values of the different digits, nor of 
the current value of either the subtrahend or the minuend. In this 
representation, a subtraction problem appears as two strings of 
digits. The representations for the regrouping and the augmenting 
algorithms were very similar. 

If the learner thinks of a subtraction problem in terms of 
physical operations on the digits in the problem display, he or she 
cannot benefit from conceptual instruction. For example, 
instructions that mention the place value of a particular digit can 
have no impact on a learner who has not internally represented that 
place value. There is nothing for such an instruction to relate to. 
The constraints we supplied to HS in the mechanical case were 
shallow and superficial. They were not derived from the laws of the 
number system, and they did not mention the conceptual or 
mathematical meaning of the operations involved. 

What does it mean to do subtraction conceptually? The learner 
who does subtraction conceptually thinks about the numbers 
symbolized by the digits in the problem display, and he/she is 
aware of the mathematical interpretation of the actions performed 
on that display. Consistent with this view, the HS representation 
for conceptual learning was very different from the HS 
representation for mechanical learning. In the conceptual 
representation, a subtraction problem is encoded at the top level as 
a difference between two numbers. The subtrahend and the minuend 
are both associated with particular additive decompositions, i.e., 
sets of numbers that add to those numbers. The elements of the 
additive decompositions are associated with a face value and a 
place value. In the conceptual representation, the distinction 
between numbers and digits is explicit, and the face values of the 
additive components are associated with the digits in the problem 
display. The operations of crossing out and writing digits 
correspond to internal, mental operations on the numbers 
symbolized by those digits. The representations for the regrouping 
and augmenting algorithms were once again very similar. 
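
The contrast between the two representations can be sketched as data structures. The class and field names below are illustrative assumptions, not the encoding actually used in HS: the mechanical form is just two digit strings, while the conceptual form ties each digit to a face value and a place value inside an additive decomposition.

```python
from dataclasses import dataclass
from typing import Tuple


def mechanical(minuend: int, subtrahend: int) -> Tuple[str, str]:
    """Mechanical form: two strings of digits, as in the written display."""
    return str(minuend), str(subtrahend)


@dataclass(frozen=True)
class Component:
    face_value: int   # what is written in the column
    place_value: int  # 1, 10, 100, ...

    @property
    def value(self) -> int:
        return self.face_value * self.place_value


@dataclass(frozen=True)
class Number:
    components: Tuple[Component, ...]  # an additive decomposition

    @property
    def value(self) -> int:
        return sum(c.value for c in self.components)


def decompose(n: int) -> Number:
    """Standard decomposition: one component per digit."""
    text = str(n)
    return Number(tuple(
        Component(int(d), 10 ** (len(text) - 1 - i)) for i, d in enumerate(text)
    ))


@dataclass(frozen=True)
class ConceptualProblem:
    minuend: Number
    subtrahend: Number

    @property
    def difference(self) -> int:
        return self.minuend.value - self.subtrahend.value


thirty_two = decompose(32)
# A regrouped decomposition of the same number: 2 tens and 12 ones.
regrouped = Number((Component(2, 10), Component(12, 1)))
problem = ConceptualProblem(decompose(32), decompose(17))
```

In the conceptual form the "3" in "32" carries the value 30, and a regrouped decomposition can be checked for having the same value as the original. Nothing comparable can be stated over the two strings of the mechanical form.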

In addition to the representation of the problem and the 
constraints, HS must also be given some initial procedural 
knowledge. Without initial rules HS cannot generate behavior, and so 
cannot discover constraint violations. In the simulation runs 
presented in this subsection, HS was given the correct procedure 
for canonical subtraction problems, i.e., problems in which the 
minuend digit is larger than the subtrahend digit in every column. 
The system learned to solve non-canonical problems, i.e., problems 
for which the minuend digit is smaller than the subtrahend digit in at 
least one column. In common parlance, the system learned to 
'borrow'. I shall refer to this process as canonicalization, since the 
purpose of 'borrowing' is to bring a non-canonical problem into 
canonical form. In summary, the system learned two different 
canonicalization methods, regrouping and augmenting, with two 
different representations of each method. 
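
For readers who want the two canonicalization methods side by side, here is a minimal sketch. It assumes the usual textbook readings: regrouping takes one unit from the nearest non-zero minuend digit to the left (turning any intervening zeroes into nines), while augmenting adds ten to the minuend digit and one to the subtrahend digit in the next column. The function names are hypothetical, and columns with equal digits are treated as canonical.

```python
from typing import List


def digits(n: int, width: int) -> List[int]:
    """Digits of n, most significant first, left-padded with zeroes."""
    return [int(ch) for ch in str(n).zfill(width)]


def is_canonical(minuend: int, subtrahend: int) -> bool:
    """True if no column needs borrowing (assumes subtrahend <= minuend)."""
    width = len(str(minuend))
    return all(t >= b for t, b in zip(digits(minuend, width), digits(subtrahend, width)))


def _check(minuend: int, subtrahend: int) -> None:
    if not 0 <= subtrahend <= minuend:
        raise ValueError("requires 0 <= subtrahend <= minuend")


def subtract_regrouping(minuend: int, subtrahend: int) -> int:
    _check(minuend, subtrahend)
    width = len(str(minuend))
    top, bottom = digits(minuend, width), digits(subtrahend, width)
    answer = [0] * width
    for col in range(width - 1, -1, -1):      # rightmost column first
        if top[col] < bottom[col]:
            donor = col - 1
            while top[donor] == 0:            # skip blocking zeroes
                donor -= 1
            top[donor] -= 1
            for k in range(donor + 1, col):   # each blocking zero becomes 9
                top[k] = 9
            top[col] += 10
        answer[col] = top[col] - bottom[col]
    return int("".join(map(str, answer)))


def subtract_augmenting(minuend: int, subtrahend: int) -> int:
    _check(minuend, subtrahend)
    width = len(str(minuend))
    top, bottom = digits(minuend, width), digits(subtrahend, width)
    answer = [0] * width
    for col in range(width - 1, -1, -1):
        if top[col] < bottom[col]:
            top[col] += 10                    # add ten to the minuend digit ...
            bottom[col - 1] += 1              # ... and one to the next subtrahend digit
        answer[col] = top[col] - bottom[col]
    return int("".join(map(str, answer)))


example = (subtract_regrouping(502, 147), subtract_augmenting(502, 147))
```

In this sketch the augmenting step is the same whether or not the neighboring minuend digit is zero, whereas regrouping needs an extra loop for blocking zeroes such as the one in 502 - 147.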

In each training run the system tries to solve its current 
problem. Since the rules for canonical problems cannot handle 
non-canonical problems, the system commits mistakes. The mistakes 
are identified by the constraints, and the system applies its 
learning mechanism to revise the rules. It then starts over. 
Eventually it learns to solve the problem correctly. If the system is 
given a second training problem, it may or may not solve that 
problem correctly. It depends on the relation between the training 
problems. If it fails to solve the second training problem correctly, 
it revises its procedure further. In the simulation runs reported 
below, the system was fed successive training problems until it 
arrived at the correct subtraction procedure. The number of training 
problems required varied between two and five, depending on 
condition. The correctness of the learned procedure was verified by 
running it on the 66-item subtraction test described earlier in this 
chapter. 

Computational results 

Table 2 shows the amount of computational work required to 
learn to canonicalize in each of the four conditions, summed over 
all training problems in each condition. It contains several 
interesting effects. First, the regrouping models require more 
learning to handle columns with blocking zeroes than columns 
without blocking zeroes. The augmenting models, on the other hand, 
are not affected by blocking zeroes. Second, regrouping is 
computationally more expensive than augmenting. The only 
exception is that if we disregard blocking zeroes, then regrouping is 
easier to learn than augmenting with a mechanical representation. 
Third, conceptually based learning is more complex than mechanical 
learning for both regrouping and augmenting. Also, the difference 
between the conceptual and the mechanical representations is 
larger in the case of regrouping than in the case of augmenting. The 
conceptual regrouping model required 2.3 times as many cycles as the 
mechanical one, while the conceptual augmenting model required 1.3 
times as many cycles as its mechanical counterpart. Finally, it makes no 
difference whether we measure the computational complexity by 
the number of cycles or by the number of search states visited 
during learning. All effects mentioned here occur in both variables. 
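
The cycle ratios quoted above can be recomputed from the cycle counts reported in Table 2. The short sketch below does this; the dictionary layout is an assumption made for illustration. With these figures the 2.3 ratio corresponds to the row with blocking zeroes, and the row without blocking zeroes gives about 2.1.

```python
# Production system cycles, copied from Table 2.
cycles = {
    "regrouping": {
        "no blocking zeroes": {"conceptual": 940, "mechanical": 449},
        "blocking zeroes": {"conceptual": 1815, "mechanical": 794},
    },
    "augmenting": {
        "no blocking zeroes": {"conceptual": 862, "mechanical": 687},
        "blocking zeroes": {"conceptual": 862, "mechanical": 687},
    },
}


def ratio(algorithm: str, condition: str) -> float:
    """Conceptual-to-mechanical cycle ratio, rounded to one decimal."""
    row = cycles[algorithm][condition]
    return round(row["conceptual"] / row["mechanical"], 1)


regrouping_blocking = ratio("regrouping", "blocking zeroes")
regrouping_plain = ratio("regrouping", "no blocking zeroes")
augmenting_any = ratio("augmenting", "blocking zeroes")
```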
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Table 2. The amount of computation required by the HS model to 
learn to canonicalize under four different conditions, measured both 
in terms of the number of search states visited and the number of 
production system cycles required. 

                               Type of representation 

                           Conceptual          Mechanical 

Algorithm learned        States  Cycles      States  Cycles 

Regrouping 
  No blocking zeroes        968     940         464     449 
  Blocking zeroes          1843    1815         828     794 

Augmenting 
  No blocking zeroes        889     862         689     687 
  Blocking zeroes           889     862         689     687 



It is, of course, possible to question the psychological relevance 
of both the number of production system cycles and the number of 
search states visited. Both measures are heavily dependent on the 
theoretical assumptions behind the simulation model. If the human 
learner is not doing search, or if human cognition is not a 
production system architecture, there might be no relation between 
these measures and measures of cognitive work in humans. In 
addition, both measures depend on the particular implementation of 
the four simulation models. 

But the complexity of the four learning processes can also be 
measured in terms of the number of learning events and the number 
of rules learned. A learning event is an event in which the system 
discovers a constraint violation, and revises its current rule set. A 
learning event might lead to the construction of one or more new 
rules. The number of learning events required is not primarily a 
function of the theoretical assumptions behind the models or of the 
implementation details. It is a measure of how many 'things' there 
are to learn before the correct procedure has been acquired; it is 
primarily a function of the logic of the learning task. 



Table 3. The amount of learning required by the HS model to learn 
to canonicalize under four different conditions, measured both in 
terms of the number of learning events required and the number of 
new rules created. 

                               Type of representation 

                           Conceptual          Mechanical 

Algorithm learned        Events   Rules      Events   Rules 

Regrouping 
  No blocking zeroes         23      35          16      23 
  Blocking zeroes            32      50          24      32 

Augmenting 
  No blocking zeroes         20      29          18      24 
  Blocking zeroes            20      29          18      24 

Table 3 shows the amount of learning required to master 
regrouping and augmenting, measured in terms of the number of 
learning events as well as the number of new rules learned. All the 
effects observed in Table 2 are reproduced in Table 3: Regrouping is 
more complex to learn than augmenting (except for problems 
without blocking zeroes, in the mechanical representation), the 
conceptual versions require more learning than their mechanical 
counterparts, and the difference between the conceptual and the 
mechanical versions is larger in the case of regrouping than in the 
case of augmenting. All effects appear with both measures. The 
main difference between Tables 2 and 3 is that both the absolute 
values and the relative size of the various effects are smaller. 



Table 4. The amount of instruction required by the HS model to 
learn to canonicalize under four different conditions, measured both 
in terms of the number of constraints (instructions) required and 
the number of training problems needed. 

                               Type of representation 

                          Conceptual               Mechanical 

Algorithm learned    Constraints  Problems    Constraints  Problems 

Regrouping                    31         4             21         4 

Augmenting                    25         2             20         5 



Table 4 shows yet another way to measure the outcome of the 
simulation experiments. Instead of measuring the amount of 
learning, Table 4 measures the amount of instruction needed to 
teach the HS model the two subtraction algorithms. The amount of 
instruction is measured in terms of how many constraints, i.e., 
instructions, we had to provide HS with in order to bring it up to 
correct performance. All the relevant effects from the other tables 
are reproduced in this variable. Regrouping requires more 
constraints than augmenting, and the difference is larger in the 
conceptual than in the mechanical case. 

The amount of instruction can also be measured in terms of the 
number of training problems needed to bring the model up to correct 
performance. This measure shows a different pattern: With respect 
to regrouping, the number of training problems is the same for both 
conceptual and mechanical representations. Augmenting requires 
one more training problem than regrouping in the mechanical 
representation. Finally, to learn augmenting with the conceptual 
representation requires only two training problems, the lowest of 
the four measures. This is the only case where the conceptual 
representation has an advantage. The number of training problems is 
a coarse measure of the complexity of the learning processes 
involved, and this result carries little weight against the 
consistent pattern across the five other measures. 

Discussion of substantive conclusions 

The results from the learning runs imply, briefly put, that 
regrouping is more difficult than augmenting, and that learning 
subtraction conceptually is more difficult than learning it 
mechanically. Since these results go against current wisdom in the 
mathematics education community, it is natural to ask what 
confidence we can place in them. The simulation model that 
produced these results might not be an accurate model of human 
learning. There is the possibility that the production system 
hypothesis is wrong. Also, the particular learning mechanism 
implemented in HS might not correspond to any type of learning that 
humans do. In either case, we would have to admit that HS does not 
simulate human performance or learning. The relevance of the 
computational results to instruction is then doubtful. 

Another possibility which would lessen the relevance of the 
computational results is that the production system hypothesis is 
correct, but HS is the wrong implementation of it. Simulation 
models are always underdetermined by the theories they embody 
[26]. There is always the possibility that the computational results 
depend upon this or that technical detail of the implementation. It 
would clearly be capricious to base instruction on results which 
depend on programming style. 

Although both of these objections to computer simulations are 
valid in principle, I believe that the particular computational 
results reported here are principled. The effects in Tables 2 through 
4 are not caused by this or that exotic feature of the 
implementation of HS, but by the fact that the gap between 
principles and procedures in arithmetic is wide, much wider than 
the intuitions of mathematically literate people suggest. To support 
this claim, I will discuss three aspects of that gap: the role of 
spatio-temporal relations, the function of expediency in algorithm 
design, and the importance of attention allocation. 

The role of spatio-temporal relations. Equality relations 
between quantities are timeless and without spatial interpretation. 
For example, the associative law 

(a + b) + c = a + (b + c) 

states that the sum of any two numbers x and c, where x is the sum 
of any two numbers a and b, is equal to the sum of the two numbers 
a and y, where y is the sum of b and c. The law does not say anything 
about spatial locations or directions. The fact that the law has a 
left-to-right linear structure is a property of the paper medium. If 
the law was encoded as a list-structure in a computer, the 
individual symbols might be distributed in a very different spatial 
pattern, but the law would have the same meaning. Neither does the 
law speak about temporal order. The addition operations mentioned 
in the law are not related through relations such as before and 
after, and concepts like first, next, and last have no role in the 
understanding of the law. The laws of the number system express 
equality relations abstracted from time and space. 

The control of action, on the other hand, is all about spatio- 
temporal relations. The main function of an algorithm or a problem 
solving procedure is to order primitive actions in time, to regulate 
which action is to be done before or after which other action. 
Furthermore, the actions, to the extent that they are motor actions, 
have to be performed at some particular location in space, on some 
particular object. If a digit is to be crossed out, the spatial 
coordinates for that object must be known. If the right action is 
performed in the wrong spatial location, an error is likely to result. 
To learn a cognitive skill is to acquire a structure for the spatio- 
temporal control of action. 

If the mathematical structure (the set of laws that constitute 
the rationale for a particular algorithm) ignores time and space, 
and if the cognitive skill involved in executing that algorithm is a 
structure for spatio-temporal organization, it follows that the 
mathematical structure does not fully determine the skill. One 
cannot derive that this action has to be performed before that 
action from mathematical laws which do not speak about temporal 
relations; one cannot direct an action to this spatial location rather 
than that with the help of laws which do not speak about space. 
Information about time and space has to be added to the 
mathematical principles in order to control action. Knowledge about 
the mathematical rationale for an algorithm is not sufficient for 
the construction of the algorithm. 

The role of expediency in algorithm design. The belief that 
mathematical principles determine mathematical action ignores the 
role of expediency in the design of the place value algorithms. Why, 
for example, do we solve place value problems by processing the 
columns in order from lower to higher place values? There is no 
mathematical reason for this rule. It is equally correct to begin 
subtracting to the left, i.e., with the highest place value column, 
and work towards the right, i.e., towards columns with lower place 
values. Unlike the standard procedure, this alternate procedure, 
although mathematically correct, requires that already 
processed columns be processed again every time the 
minuend is regrouped. Beginning with the lowest place value column 
saves work; it is a choice dictated by expediency, not by 
correctness. Indeed, there is no mathematical reason to regroup in 
the first place. It is possible to perform subtraction by processing 
each column independently of the others, recording negative results 
when appropriate, and then combining the column results into the 
final answer. The decision to regroup is dictated by economy 
considerations, not by mathematical principles. 
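
The claim that regrouping is not mathematically required can be made concrete with a small sketch of the column-independent procedure just described. The function name is hypothetical. Each column is subtracted on its own, negative column results are recorded as they are, and the results are then combined using the place values.

```python
from typing import List, Tuple


def subtract_by_columns(minuend: int, subtrahend: int) -> Tuple[List[int], int]:
    """Subtract column by column without regrouping.

    Returns the signed column results (most significant first) and the
    final answer obtained by weighting each result with its place value.
    Intended for non-negative integers.
    """
    width = max(len(str(minuend)), len(str(subtrahend)))
    top = [int(ch) for ch in str(minuend).zfill(width)]
    bottom = [int(ch) for ch in str(subtrahend).zfill(width)]
    column_results = [t - b for t, b in zip(top, bottom)]  # may be negative
    total = sum(r * 10 ** (width - 1 - i) for i, r in enumerate(column_results))
    return column_results, total


columns, answer = subtract_by_columns(502, 147)
# columns holds 4, -4, -5, i.e. 400 - 40 - 5
```

The procedure is correct but leaves the work of combining signed partial results to the end, which is the kind of cost the standard algorithm avoids.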

The place value algorithms evolved over a long period of time as 
efficient means of performing calculations. The main reason to 
adhere to those algorithms is that they save work, as compared to 
other, equally correct procedures. But there is no relation between 
the mathematical theory of place value and the expediency of the 
algorithms that build on it. One cannot derive that this way of 
doing subtraction is more efficient than that way from the laws of 
the number system. The shape of these algorithms is not determined 
by the underlying mathematical principles, so understanding those 
principles contributes little to the learning of the algorithms. Any 
aspect of a procedure which is grounded in expediency rather than in 
mathematical concepts and relations will appear arbitrary and 
incomprehensible regardless of how well the conceptual rationale for 
that procedure is understood. 

School children cannot be aware of the expediency of the place 
value algorithms. In order to realize how economical they are, one 
must have something to compare them to. Since children are taught 
the efficient algorithms, they have no experience of less efficient 
ways of doing calculations. Also, since children are not doing 
calculations for a living, they have no interest in expediency. 

The importance of attention allocation. One of the most robust 
findings of cognitive psychology is that there are severe limits on 
how much information can be kept in working memory at any one 
point in time. This limitation is simulated in the HS system by 
letting working memory elements decay as time passes. The main 
consequence of this limitation is that the control of attention is a 
central issue in all action, including mathematical action. If you 
cannot keep all information in the problem display in your head 
simultaneously, then you have to access it sequentially, by moving 
your eye over it in a carefully controlled manner. To learn 
subtraction is to learn where to look. Obviously, mathematical 
principles have nothing to say about this aspect of mathematical 
action. No matter how well one understands the concept of place 
value, one still has to figure out where to look at each moment 
during subtraction. 
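
A toy sketch of decaying working memory shows why attention has to be scheduled. The class, the fixed lifetime, and the element labels are assumptions for illustration and say nothing about how decay is actually implemented in HS.

```python
from typing import Dict, Set


class DecayingWorkingMemory:
    """Elements vanish unless re-attended within `lifetime` ticks (toy model)."""

    def __init__(self, lifetime: int = 2) -> None:
        self.lifetime = lifetime
        self._age: Dict[str, int] = {}

    def attend(self, element: str) -> None:
        """Looking at an element (re)activates it."""
        self._age[element] = 0

    def tick(self) -> None:
        """One processing cycle passes; elements that are too old are dropped."""
        self._age = {e: age + 1 for e, age in self._age.items()
                     if age + 1 < self.lifetime}

    def contents(self) -> Set[str]:
        return set(self._age)


wm = DecayingWorkingMemory(lifetime=2)
wm.attend("ones column")
wm.tick()
wm.attend("tens column")
wm.tick()
after_decay = wm.contents()      # the ones column has decayed by now
wm.attend("ones column")         # looking again restores it
after_looking_again = wm.contents()
```

Under such a limit a procedure must specify not only which action to take but also which part of the display to look at next, since information that is not re-attended is lost.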

In summary, there are at least three principled reasons to 
believe in a wide derivational gap between mathematical principles 
and mathematical action. First, mathematical principles ignore 
questions of space and time, while a cognitive procedure is a 
structure for the spatio-temporal control of action. Second, 
mathematical principles ignore the cost of computing a result, 
while the standard place value algorithms are designed for 
maximum expediency. Children cannot understand those features of 
place value algorithms which are designed with expediency in mind, 
because they have no experience of the less expedient alternatives; 
and, unlike the professional calculators who developed the 
algorithms, children have no particular interest in economy. Third, 
the limited capacity of human working memory implies that not all 
task information can be kept active at all times. Consequently, any 
cognitive skill must specify how attention is to be allocated over 
the task information. But mathematical principles have nothing to 
say about the allocation of attention. 

If the gap between mathematical principles and mathematical 
action is as wide as the above discussion suggests, then how is an 
understanding of the mathematical concepts and principles 
underlying a particular algorithm supposed to facilitate the 
construction of the cognitive skill? This question has not been 
clearly answered by any current theory of mathematical cognition, 
and I suggest that no answer exists. The gap between mathematical 
knowledge and mathematical action is difficult to bridge; that is 
why it took two millennia to develop the place value algorithms, 
and that is why school children make mistakes even after they have 
grasped the rationale of an algorithm. 

The fact that the derivational distance between mathematical 
principles and mathematical action is large does not in and of itself 
explain why the HS model needs to compute more in the case of a 
conceptual representation than in the case of a mechanical 
representation. Granted that the derivational distance is large, we 
still need an explanation for why it is larger in one case than in the 
other. The explanation is simple: There is more work involved in 
updating and processing a rich representation than an impoverished 
one. There are more relations to keep track of, and therefore more 
operations to perform. Each of those operations has to be controlled 
by some procedural rule; hence, there are more rules to learn, or 
more complicated conditions for the rules. The same must be true of 
humans; updating and maintaining a richer mental representation 
must require more cognitive work. 

Because the gap between the mathematical principles and the 
mathematical procedures is so wide, I believe that any reasonable 
simulation model of knowledge-based acquisition of an arithmetic 
procedure will reproduce the results reported here. The reader who 
disbelieves this is urged to prove me wrong by developing a 
simulation model that can learn subtraction both conceptually and 
mechanically and which expends less computation in the former 
case than in the latter. 

According to the results reported here, William Brownell could 
not have been more wrong. Regrouping is more difficult to learn 
than augmenting. In particular, regrouping in a conceptually rich 
representation is more difficult to learn than regrouping done 
mechanically, and the disadvantage of the conceptually rich 
representation as compared to the mechanical case is much larger 
for regrouping than for augmenting. These results directly 
contradict Brownell's conclusion that regrouping is easier than 
augmenting, particularly when taught conceptually [5, 6]. 

At first glance, this contradiction seems devastating for the 
model. After all, Brownell's conclusion was based on empirical 
observations, and in the case of a contradiction between theory and 
data, it is the theory that must go. However, unlike simulation 
studies, empirical studies cannot differentiate between learning 
and performance, between the amount of cognitive work needed to 
learn an algorithm and the amount of cognitive work needed to 
execute it, once learned. The only way to measure the cost of 
learning is to observe performance, so any empirical measure will 
necessarily confound the two. As the reader might recall, the 
simulation of performance in the first study reported in this 
chapter did produce results which fit the empirical data rather 
well. It is therefore reasonable to interpret those data as measures of the 
cognitive cost of executing the algorithms rather than of the 
cognitive cost of learning them. We then have a good fit between the 
theory and the data themselves, but no support for Brownell's 
interpretation of the data. 

The result that conceptual instruction requires more 
computational work than mechanical instruction is comforting to 
the researcher who desperately wants to know why well-intended, 
carefully planned and skillfully executed instructional 
interventions that aim to impart conceptual understanding do not 
succeed in producing correct performance [27, 33]. But it is less 
comforting to the educator or teacher who is responsible for 
designing efficient instruction. The simulation results imply that it 
is a mistake to expect conceptual understanding to facilitate 
procedural learning; instead, the results indicate that conceptually 
based instruction will be more costly in terms of time and effort 
than mechanical instruction. The relation between the conceptual 
rationale of an arithmetic procedure and the procedure is an 
instructional topic in its own right, a topic, moreover, which is 
complicated and therefore requires time and effort on the part of 
both instructor and student. Instead of being a tool for teaching (the 
same old) arithmetic, conceptually based instruction in arithmetic 
constitutes a higher pedagogical ambition, as compared to 
mechanical instruction. 

It is easy to feel sympathy with this higher ambition. We 
obviously want students to grasp the rationale behind the 
arithmetic algorithms. The present discussion is not meant to imply 
that conceptual instruction in arithmetic is wrong or undesirable. 
What is wrong is the expectation that such instruction can be 
digested more easily and with less effort than mechanical instruction. 

Conceptually based instruction in arithmetic might need to 
revisit the idea of a spiral curriculum [7, pp. 52-54]: Teach the 
algorithms with a small amount of conceptual interpretation at an 
early age; teach them again with a deeper presentation of the 
conceptual rationale when the students have acquired more 
mathematical knowledge; and so on. The topic could be visited as 
many as four or five times between third grade and college, each 
visit probing deeper into the conceptual rationale, until the 
students are able to carry out a relatively tight derivation of the 
algorithms (e.g., as in [25]). To the best of my knowledge, no large 
scale empirical evaluation of such a spiral curriculum for 
arithmetic has yet been done. 

Evaluation of the General Method 

The specific conclusions about arithmetic instruction presented in 
this chapter are controversial and unlikely to be accepted without a 
debate. Such a debate would be welcome. But the controversial 
nature of the domain-specific conclusions should not be allowed to 
obscure the fact that the present study also contributes a general 
method with a potentially greater impact. 

The main method of traditional educational research is well 
exemplified by the studies conducted in order to choose between the 
regrouping and augmenting algorithms: To determine the relative 
advantage of an instructional design A as compared to an 
alternative design B, teach one set of students with design A and a 
second set of students with design B, and compare the outcomes. 
This empirical method is laborious and time consuming. In addition, 
it is rarely successful in settling the instructional issue at hand. 
Measures of instructional outcomes are so imprecise and coarse 
that a negative outcome is unconvincing. The opponents of the 
hypothesis favored by the author of such a study can always feel 
justified in questioning whether the measures used were sensitive 
enough to register even quite significant effects. On the other hand, 
a positive effect is equally unconvincing. An observed effect cannot 
be ascribed to the instructional intervention with any certainty, 
because it is almost impossible to achieve control over all the 
determinants of an instructional outcome. Empirical comparisons 
between alternative instructional designs carry little intellectual 
authority, regardless of outcome. 

Teachable simulation models enable an alternative method for 
investigating instructional questions. Instead of teaching the 
relevant instructional topic in different ways to different groups of 
students, we can teach it in different ways to a model of learning, 
if that model takes the form of a robust, runnable simulation. The 
simulation runs provide us with measures of the amount of 
computational work required to learn the target topic under 
different modes of instruction. A significantly lower value for mode 
A than for its rival B constitutes a prediction that A is the 
preferred way of teaching the target topic. 
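
The comparison just described can be sketched in a few lines of 
Python. This is a minimal illustration and not the HS model itself: 
the names learning_cost and preferred_mode are hypothetical, the 
cost measure (cycles summed over the training problems) is an 
assumed stand-in for "computational work", the margin parameter is 
an assumed stand-in for "significantly lower", and the cycle counts 
in the usage lines are placeholders, not results from this study. 

```python
from typing import Dict, List


def learning_cost(cycles_per_problem: List[int]) -> int:
    """Total computational work: cycles summed over all training problems."""
    return sum(cycles_per_problem)


def preferred_mode(runs: Dict[str, List[int]], margin: float = 0.1) -> str:
    """Return the instruction mode with the lowest learning cost.

    Returns 'undecided' when the cheapest mode does not beat the
    runner-up by at least `margin` (a hypothetical relative threshold).
    Expects at least two modes in `runs`.
    """
    costs = sorted((learning_cost(c), name) for name, c in runs.items())
    (best_cost, best), (next_cost, _) = costs[0], costs[1]
    if next_cost == 0 or (next_cost - best_cost) / next_cost < margin:
        return "undecided"
    return best


# Placeholder cycle counts for illustration only; not data from this study.
runs = {"A": [120, 95, 80], "B": [200, 150, 130]}
print(preferred_mode(runs))
```

With a real teachable model, the lists would be filled from the 
learning runs for each mode of instruction, and the threshold 
would have to be justified rather than assumed. 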

Using this method, an instructional designer can invent a new 
approach to a particular topic, use it to teach that topic to the 
model, and have a preliminary outcome, all in a matter of days. 
Preparing the inputs (the initial procedural knowledge and the 
instructions) to a teachable simulation model is not a trivial task, 
but it is measured in hours or days, rather than in months or years. 
Such rapid turnaround between an instructional idea and its 
evaluation has the potential to facilitate search through the space 
of instructional designs [30]. Many different designs can be tried 
and compared at a relatively low cost and in a relatively short time. 

A teachable simulation model can also help identify fruitless 
questions and inappropriate techniques. Consider once again the 
large scale classroom studies of the pre-World War II era that 
attempted to settle the controversy between regrouping and 
augmenting empirically. My simulation results show that there is no 
reason to expect any differences between regrouping and 
augmenting on measures of performance. The two algorithms are 
nearly equal in cognitive complexity, once learned. Hence, trying to 
measure the difficulty of the two algorithms by measuring 
performance is not a useful endeavor. The differences between the 
algorithms only affect the amount of cognitive work required to 
learn the algorithms. But pure empirical measures of learning are 
hard to come by. One possibility is to count the number of learning 
events per unit time as revealed by think-aloud protocols, a 
measure hardly ever used in learning research (but see [40] for an 
exception). No such measure was employed in the pre-World War II 
studies that compared regrouping and augmenting. Those studies 
could not, in principle, resolve the issue they were addressing, 
because they were approaching it with the wrong tools. Theoretical 
clarification is a necessary prerequisite for meaningful data 
collection in instructional science as in other sciences. 
Implementing and using a teachable simulation model is one way to 
achieve such clarification. 

A second traditional approach to instructional design, over and 
above empirical comparisons between alternative teaching methods, 
is to base particular decisions on general design principles, which, 
in turn, are derived in some more or less intuitive way from a 
learning theory. The debate about how to teach subtraction could 
conceivably be decided by the application of such a principle. For 
example, we could apply the principle of successive elaborations: A 
topic should be taught by first presenting a kernel idea, an epitome, 
which is then successively elaborated [32]. But this principle does 
not discriminate between the different ways of teaching 
subtraction. Both regrouping and augmenting can be taught by first 
presenting the basic idea of the algorithm, and then elaborating it. 
As a second example, consider the principle, proposed by Anderson, 
Boyle, Farrell, and Reiser, that one should teach the goal hierarchy 
of the target skill [2]. Once again, this principle does not 
discriminate between alternative subtraction algorithms. As a last 
example, a colleague of mine suggested that one should prefer 
regrouping over augmenting on the principle that teaching should 
facilitate future learning, and the regrouping operation is more 
generally useful than the augmenting operation. But it is unclear in 
what sense the law of associativity is more generally useful than 
the constant difference law; both seem equally necessary for 
continued study in mathematics. In short, the disadvantage of using 
general design principles as mediators between theories of learning 
and instructional designs is that the application of those design 
principles is seldom straightforward. 
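
For readers who want the two algorithms side by side, the following 
Python sketch implements both on columns of digits. It assumes the 
common textbook forms: regrouping takes 1 from a higher column of 
the minuend and adds 10 to the current column, leaving the minuend 
unchanged in value, while augmenting adds 10 to the current minuend 
column and 1 to the next subtrahend column, leaving the difference 
unchanged. The function names are hypothetical, and the code is not 
the production system used in the HS model. 

```python
from typing import List


def to_digits(n: int, width: int) -> List[int]:
    """Digits of n, least significant first, padded to `width` columns."""
    return [(n // 10 ** i) % 10 for i in range(width)]


def from_digits(digits: List[int]) -> int:
    """Inverse of to_digits."""
    return sum(d * 10 ** i for i, d in enumerate(digits))


def subtract_regrouping(minuend: int, subtrahend: int) -> int:
    """Borrow by regrouping the minuend; its total value never changes."""
    assert minuend >= subtrahend >= 0
    width = len(str(minuend))
    top, bottom = to_digits(minuend, width), to_digits(subtrahend, width)
    result = []
    for i in range(width):
        if top[i] < bottom[i]:
            j = i + 1
            while top[j] == 0:  # regroup across zeros: 0 becomes 9
                top[j] = 9
                j += 1
            top[j] -= 1         # take one unit from the higher column
            top[i] += 10        # and add it to the current column as ten
        result.append(top[i] - bottom[i])
    return from_digits(result)


def subtract_augmenting(minuend: int, subtrahend: int) -> int:
    """Equal additions; both numbers grow by the same amount."""
    assert minuend >= subtrahend >= 0
    width = len(str(minuend))
    top, bottom = to_digits(minuend, width), to_digits(subtrahend, width)
    result = []
    for i in range(width):
        if top[i] < bottom[i]:
            top[i] += 10        # ten added to the minuend in this column
            bottom[i + 1] += 1  # the same amount added to the subtrahend
        result.append(top[i] - bottom[i])
    return from_digits(result)


print(subtract_regrouping(403, 17), subtract_augmenting(403, 17))  # 386 386
```

Both procedures return the same difference; they differ only in 
which number is rewritten along the way, which is where the two 
laws mentioned above enter. 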

The method of teachable simulation models links learning theory 
to instructional design in a different way. The method brings 
learning theory to bear on particular issues, without mediation by 
general design principles. For example, the simulation runs 
presented in this chapter tell us that augmenting is easier to learn 
than regrouping and that the advantage of augmenting is increased 
with conceptually based instruction. The simulation runs resolve 
the particular issue of regrouping versus augmenting, but they do 
not suggest any principle of arithmetic instruction, let alone any 
general design principle. The principles of learning embedded in the 
model are applied directly to the instructional issue at hand. 
Whether this is, in general, a better way to proceed than via general 
design principles cannot be determined here. The two different 
ways of linking learning theory to instructional design are not 
incompatible. A mixture of both approaches will probably prove 
most advantageous. 

Testing instructional designs by trying them out on a simulation 
model seems to presuppose that we have accurate simulation 
models. There are three answers to this objection. First, the lack of 
accuracy of today's models and theories is a temporary 
disadvantage. As research into human learning progresses, we will 
be able to construct more accurate theories. It is desirable to have 
a method which allows us to channel increased theoretical 
understanding into improved instructional designs. The dependence 
on the accuracy of our learning theory is not (only) a bug, it is (also) 
a feature. Second, the extent to which particular computational 
results depend upon the accuracy of the model is a matter for 
debate. In the preceding section I argued that the results reported 
in this chapter are consequences of deep features of arithmetic, and 
hence relatively independent of the particulars of the HS model. (It 
is clear how to provide evidence for or against claims of this kind: 
A claim about independence of results from a particular model is 
supported if the results can be reproduced with a different model.) 
Third, a theory need not be entirely accurate to be useful. Even 
approximate theories can often supply information that improves 
upon common sense and rules of thumb. 

Answering questions through theoretical calculations goes 
against the grain in a discipline that was shaped in the heyday of 
the peculiar brand of empiricism advocated by the logical 
positivists. It is therefore useful to look up from our local concerns 
and observe that the ratio of theoretical calculation to empirical 
observation tends to grow as scientific disciplines mature. Once 
upon a time, geometers measured angles in order to decide whether 
a triangle was a right triangle or not. By the time Euclid wrote his 
great treatise, geometry was already a purely theoretical discipline 
in which answers to questions are derived from first principles. 
Mechanics went through a similar development. Brahe and Galileo 
needed observations, but since the "rational mechanics" of the 19th 
century, questions like how much force it takes to lift a particular 
payload into orbit are answered by calculation, not by observation. 
If it were necessary to send up hundreds of rockets with different 
payloads and different thrusts in order to decide the issue 
empirically, space travel could never have gotten off the ground. In 
short, to observe is to confess ignorance; it is what scientists do 
when they have little or no theoretical understanding. As a science 
matures, calculations replace (some) empirical measurements. 
There is every reason to expect instructional science to develop 
similarly. The present chapter is but a small step in that direction. 